-
Notifications
You must be signed in to change notification settings - Fork 144
Selector validation #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Selector validation #162
Conversation
…racters should be covered now)
…tead aborting the parsing process. This seems to match the browsers' behavior
…ch the browsers' parsing behavior
Thanks! I did start work on proper selector parsing some time ago but did not have time to finish it. So for now this will have to do. Does it cover cases like: .this-selector [should="be-{"] .valid {
} and .this-selector /* should remain-} */ .valid {
} ? Also see the ignored test case in https://github.com/sabberworm/PHP-CSS-Parser/blob/a90142fa0c664db18ea01948a61466bf1ff79220/tests/files/-tobedone.css#L1. |
Just tested how Chrome behaves in these cases and the first example is actually invalid. The second one, however, should be valid but it is currently not. I will try to fix this soon. |
No it isn’t. See https://codepen.io/anon/pen/GazJYb Works flawlessly on Firefox and Chrome. |
Just pushed an update which resolves the pending issues. Please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think this looks good but I do have some reservation about this huge regex that is too complicated. Could you maybe make that more readable?
public function __construct($sSelector, $bCalculateSpecificity = false) { | ||
if (!Selector::isValid($sSelector)) { | ||
throw new UnexpectedTokenException("Selector did not match '" . self::SELECTOR_VALIDATION_RX . "'.", $sSelector, "custom"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure it is the job of the Selector
class to validate its input. I think it’s the job of the selector parsing logic (i.e. DeclarationBlock::parse
) to do that.
@@ -35,10 +37,21 @@ class Selector { | |||
)) | |||
/ix'; | |||
|
|||
const SELECTOR_VALIDATION_RX = '/ | |||
^((?:[a-zA-Z0-9\x{00A0}-\x{FFFF}_\^\$\|\*\=\"\'\~\[\]\(\)\-\s\.:#\+\>]*(?:\\\\.)?(?:\'.*?\')?(?:\".*?\")?)*|\s*?[\+-]?\d+\%\s*)$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is too complicated to understand. If it can’t be simplified, maybe a regex isn’t the right tool for the job. But since the end goal for selectors still is a complete parser (which will make the regex obsolete), I’ll allow it for now. But maybe you could split the regex to multiple lines using concatenation and comment each line so we better understand what it does.
Also, some of the escapes are unnecessary. Inside character classes []
, you only need to escape ]
and -
(in some cases) (and maybe [
to be symmetrical), but ^
, $
, |
, *
, =
, "
, ~
, +
, (
, )
can be left literal ('
was already literal since \'
is a string escape, not a regex escape).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me.
The way we currently detect extra closing block brackets
}
is not perfect and still causes issues, because we just ignore them. This however does not match the browsers' behavior. Depending on the case the extra closing brackets might end up as part of the selector. Take a look at the following:The body selector is missing an opening
{
so the selector string will be everything up to the next{
found at the first#test {
declaration. This makes the selector invalid, so the first definition of#test
must be skipped, but parsing must continue. The final tree must include the@keyframes
, first#test
definition and the last#test
definition. The rules in between must be skipped.This is why I think we should have a selector validation and skip the rule sets for invalid selectors (which matches the browsers' behavior).
The modifications in this PR will change the output of the library, so this change might be considered BC breaking, but I believe it is a move in the right direction. I tried to build the selector validation according to the definition found in this document. More specifically this line
In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_);
. It is not perfect yet, because it doesn't check for escaped characters, or a valid start of the selector. I might try to add these if we agree that this is a viable approach to the problem.Let me know what you think.